Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts
Abstract
This paper proposes a sentence selection method that uses a maximum entropy criterion to construct recording scripts for speech synthesis. Conventional corpus design for speech synthesis often uses a greedy algorithm that maximizes phonetic coverage. For statistical parametric speech synthesis, however, the balance of phonetic and prosodic contexts matters as well as their coverage. To account for both phonetic and prosodic contextual balance during sentence selection, we define and maximize the entropy of phonetic and prosodic contexts such as biphones, triphones, accent, and sentence length. Objective experimental results show that the proposed method achieves better coverage and balance of contexts and reduces spectral and F0 distortions compared with random and coverage-based sentence selection.
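The entropy criterion can be illustrated with a minimal Python sketch. The abstract does not state how the search is carried out or how the different context types are combined, so everything here is an assumption made for illustration: selection is greedy, each sentence is represented by a single pooled list of context labels (for example biphones), and the function names `entropy` and `select_sentences` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the label distribution in a Counter."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def select_sentences(candidates, n_select):
    """Greedily pick sentences that maximize the entropy of context labels.

    candidates: list of (sentence_id, list_of_context_labels) pairs.
    Returns the selected sentence ids and the entropy of the final script.
    Assumption: greedy search; the paper's actual procedure may differ.
    """
    selected = []
    counts = Counter()          # label counts of the script built so far
    remaining = list(candidates)
    for _ in range(min(n_select, len(remaining))):
        best_idx, best_h = None, -1.0
        for i, (_, labels) in enumerate(remaining):
            # Entropy of the script if this sentence were added.
            h = entropy(counts + Counter(labels))
            if h > best_h:
                best_idx, best_h = i, h
        sent_id, labels = remaining.pop(best_idx)
        counts.update(labels)
        selected.append(sent_id)
    return selected, entropy(counts)


if __name__ == "__main__":
    # Toy candidates with made-up biphone labels.
    toy = [
        ("s1", ["a-b", "b-a", "a-b"]),
        ("s2", ["a-b", "a-b", "a-b"]),
        ("s3", ["c-d", "d-c"]),
    ]
    chosen, h = select_sentences(toy, 2)
    print(chosen, round(h, 3))
```

In this toy case the sentence that only repeats one biphone (`s2`) is never chosen, because adding it skews the label distribution. A pure coverage criterion would not distinguish it from other sentences that add no new unit, which is the contrast between balance and coverage that the abstract draws.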
Similar resources
Unit Selection Speech Synthesis Using Phonetic-Prosodic Description of Speech Databases
This paper describes an approach to speech synthesis based on using speech databases at different stages of the TTS process. Speech database units are phones in different segmental and prosodic contexts. Pitch-synchronous segmentation and labeling of the databases allow both segmental and prosodic information to be stored. Phonetic-prosodic annotations of speech databases are involved in off-line training ...
Corpus Creation for Polish Unit Selection Speech Synthesis
This paper describes the process of creating a speech corpus for Polish unit selection speech synthesis. This task is time-consuming, and designing the corpus manually is, in practice, only feasible in limited-domain speech synthesis and recognition. The sentence selection tools used while designing the corpus are usually based on the greedy algorithm. The algorithm looks for sentences which cov...
On building phonetically and prosodically rich speech corpus for text-to-speech synthesis
This paper proposes a way of preparing and recording a speech corpus for unit selection text-to-speech synthesis driven by symbolic prosody. The research focuses on a phonetically and prosodically rich sentence selection algorithm. A symbolic description on a deep prosody level is used to enrich the phonetic representation of sentences (by respecting the prosodeme types phones appear in...
Creation and analysis of a Polish speech database for use in unit selection synthesis
The main aim of this study is to describe the process of creating a speech database to be used in corpus based text-to-speech synthesis. To help achieve natural sounding speech synthesis, the database construction was aimed at rich phonetic and prosodic coverage based on variable length units (phoneme, diphone, triphone) from different phonetic and prosodic contexts. Following previous work on ...
Design of a Mandarin Sentence Set for C by Use of a Multi-tier Algorithm Tak Prosodic and Spectral Ch
This paper presents a multi-tier algorithm to extract a sentence set from a large raw text corpus for synthesis of Mandarin speech, taking account of varied prosodic and spectral characteristics. The prosodic and spectral characteristics are statistically analyzed from the text corpus and transcribed as syllable-sized unit candidates in a multi-tier way. The unit candidates cover all of the syl...
Publication date: 2015